On the Comparison of Line Spectral Frequencies and Mel-Frequency Cepstral Coefficients Using Feedforward Neural Network for Language Identification
Abstract
Received Jan 3, 2018; Revised Mar 5, 2018; Accepted Mar 23, 2018
Of the many audio features available, this paper compares two of the most popular: line spectral frequencies (LSF) and Mel-Frequency Cepstral Coefficients (MFCC). We trained a feedforward neural network with varying numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and a combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimum parameters, i.e., sampling frequency, frame size, model order, and neural network structure. The per-frame recognition rate was converted to a per-file recognition rate using majority voting. On average, the recognition rates for LSF, MFCC, and the combined features are 96%, 92%, and 96%, respectively. Therefore, LSF is the most suitable feature for language identification with a feedforward neural network classifier.
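The abstract names two concrete processing steps: LSF feature extraction and majority voting over per-frame decisions. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes LPC coefficients from the autocorrelation method (Hamming window, Levinson-Durbin recursion), takes the LSFs as the angles of the nontrivial unit-circle roots of the sum and difference polynomials P(z) and Q(z), and labels a file with its most frequent frame label (ties go to the label seen first). The sampling rate, frame length, model order of 10, and the `frame_classifier` callable (a hypothetical stand-in for the trained feedforward network) are illustrative choices, not the optimum values reported in the paper.

```python
import numpy as np
from collections import Counter


def lpc(frame, order):
    """Return LPC coefficients [1, a1, ..., a_order] (autocorrelation method).

    A Hamming window plus the Levinson-Durbin recursion yields a
    minimum-phase A(z) = 1 + a1 z^-1 + ... + a_order z^-order.
    """
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update a_1..a_{i-1}; the right-hand side is built before assignment.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a):
    """Convert LPC coefficients to line spectral frequencies in radians (0, pi).

    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z);
    the angles of their nontrivial unit-circle roots are the LSFs.
    """
    a_ext = np.append(a, 0.0)        # A(z), padded to length p + 2
    a_rev = np.append(0.0, a[::-1])  # z^-(p+1) A(1/z)
    roots = np.concatenate([np.roots(a_ext + a_rev), np.roots(a_ext - a_rev)])
    angles = np.angle(roots)
    eps = 1e-6
    # Keep one root per conjugate pair; drop the trivial roots at z = +1 and z = -1.
    lsf = angles[(angles > eps) & (angles < np.pi - eps)]
    return np.sort(lsf)


def majority_vote(frame_labels):
    """Per-file decision: the most frequent frame label (ties -> first seen)."""
    return Counter(frame_labels).most_common(1)[0][0]


def classify_file(frames, order, frame_classifier):
    """Classify each frame's LSF vector, then vote (frame_classifier is hypothetical)."""
    labels = [frame_classifier(lpc_to_lsf(lpc(f, order))) for f in frames]
    return majority_vote(labels)


def toy_classifier(lsf):
    """Hypothetical stand-in for the trained feedforward network."""
    return "Malay" if lsf[0] < 0.5 else "English"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000        # hypothetical sampling rate (Hz)
    frame_len = 256  # hypothetical frame size (samples)
    frames = rng.standard_normal((5, frame_len))
    lsf = lpc_to_lsf(lpc(frames[0], order=10))
    print("LSF (Hz):", np.round(lsf * fs / (2 * np.pi), 1))
    print("File label:", classify_file(frames, 10, toy_classifier))
```

Root-finding is the simplest way to obtain LSFs; speech codecs commonly use a Chebyshev-polynomial search along the unit circle instead, which avoids general polynomial root-finding.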
Similar articles
On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification
Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral pairs Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech....
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...
Significance of formants from difference spectrum for speaker identification
In this paper, we describe a prototype speaker identification system using auto-associative neural network (AANN) and formant features. Our experiments demonstrate that formants extracted from difference spectrum perform significantly better than formants extracted from normal spectrum for the task of speaker identification. We also demonstrate that formants from difference spectrum provide com...
Two Stage Neural Network model for Recognition of Indian Languages from Speech
India is a multilingual country. Officially about 20 languages are recognized by the government and there are about 500 languages spoken at different parts of the country. For developing the speech systems in Indian context, it is necessary to capture the language specific knowledge automatically from speech. Further it may be exploited for different speech tasks such as language identification...
Formant Estimation and Tracking Using Deep Learning
Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the former task the input is a stationary speech segment such as the middle part of a vowel and the goal is to estimate the formant frequencies, whereas in the latter task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout t...